Abstract

We  present  highly  efficient  parallel  algorithms  for  several  well-studied  dictionary 
matching  problems.  Our  algorithms  are  faster  and  more  efficient  in  terms  of  their 
parallel  work,  compared  to  previously  known  results. 

• For static dictionary matching, we present an algorithm that preprocesses the
dictionary and matches the text in O(log m) parallel time and O(M + n log m)
work, given any dictionary of size M whose longest pattern is m characters
long, and a text of size n. We have further improved this algorithm to solve
static dictionary matching with only O((M + n)√(log m)) work, if the characters
are drawn from an alphabet of constant size. A distinguishing feature of these
algorithms and the one stated below for matching in higher dimensions is that,
in contrast with previous work, the running times, and work overheads when
applicable, depend only on the length m of the longest pattern.

• We present a parallel algorithm for d-dimensional dictionary matching that runs
in O(log m) time and matches the text in O(M + n log m) work for any fixed d.

• We present a new and more efficient parallel algorithm for dynamic dictionary
matching. Insertions into and deletions from the dictionary, as well as matching
the text, can be done with optimal speedup in O(A log M) work and O(log M)
time. Here, A denotes the length of the string to be inserted into, deleted from,
or matched against a dictionary of size M.

All of the above algorithms are designed by applying the shrink-and-spawn
technique that we introduce in this paper. We also show that this technique leads to
parallel algorithms that do only optimal (linear) work for multi-dimensional pattern
matching and related problems [KLP89,Rab93]. Our algorithms are deterministic, like
those in [KLP89], but are much simpler and preserve both the efficiency and the
speed of those presented there.

1 Introduction


The input to the dictionary pattern matching problem is a set of distinct pattern strings
represented as a dictionary D, and a text string T. The goal is to find, for each location in
the text, all the patterns from the dictionary that match at that location. The classical
sequential algorithm for this problem is due to Aho and Corasick [AC75] and runs in time
O(n + M), where n and M respectively denote the sizes of the text and the dictionary.1
The dictionary D is assumed to be presented statically in the beginning, with the text
string (or more generally a sequence of text strings) presented subsequently. Aho and
Corasick preprocess the patterns in D and construct a tree (trie). This tree encodes a
generalization of the well-known “failure” and “go-to” functions introduced by Knuth,
Morris and Pratt [KMP77] in the context of string matching, wherein the dictionary
consists of a single pattern. Unfortunately, these approaches seem to be inherently sequential
and are not amenable to efficient parallelization.

The best known deterministic parallel algorithm for this problem, referred to
henceforth as static dictionary matching, is due to Amir and Farach [AF91]. Their algorithm
runs in O(log m log M) parallel time and O((M + n log m) log M) work, where m denotes
the length of the longest pattern in D. Using randomization, Amir, Farach and Matias
[AFM92] have reported an algorithm with improved time and work bounds. Their
algorithm runs in O(log M) expected time and performs O((M + n) log M) work. (Given
an input of size N, a parallel algorithm running in time T(N) using P(N) processors
performs work P(N)·T(N). A parallel algorithm has optimal speedup whenever it does
work that is asymptotically the same as the best-known sequential algorithm that runs
in seq(N) steps2, i.e., P(N)·T(N) = O(seq(N)).)

Previously  known  parallel  algorithms  for  this  and  other  well-studied  dictionary  pattern 
matching  variants  [AF91,AFGGP91,AFILS93,Gi93]  are  not  as  efficient  as  their  sequential 
counterparts,  in  that  they  achieve  parallel  speedup  at  the  expense  of  performing  more 
work. Additionally, owing to the suffix tree constructions used previously [AF91,GG93],
the running times and the work overhead incurred in parallelizing depended on the size
of the entire dictionary (M in the case of static dictionary matching), which can be
prohibitively  large. 

In this paper, we present highly efficient algorithms for a range of dictionary
matching problems, including the ones discussed above. We do this by introducing a general
shrink-and-spawn technique that we apply repeatedly to design our algorithms.

1.1  Main  results  and  Significance 

1. For static dictionary matching, we present an algorithm that preprocesses the
dictionary and matches the text in time O(log m). Its overall work complexity is
O(M + n log m). If the text and patterns are derived from a constant-sized
alphabet, then we improve the above algorithm to solve static dictionary matching in time
O(log m) with only O((M + n)√(log m)) work. (Whenever convenient, we will use
the first and second terms in the expressions denoting time and work complexities
to correspond to dictionary and text processing respectively.)

1This bound holds for an alphabet size that is polynomial in n and M.

2For a sequential algorithm, its work and running time are equivalent.

We note that these bounds are better than those for the best-known parallel
algorithms for the above problem, even when randomization is used. Because of the
manner in which the shrink-and-spawn technique works based on “naming” (to be
described later), the running times and work overheads of this algorithm and its
extension to dictionary matching in higher dimensions (item 2 below) depend
only on the length m of the longest pattern rather than on M, the total size
of the dictionary.

2. We extend the above algorithm to run in O(log m) time and O(M + n log m) work
for dictionary matching when the patterns and text are d-dimensional, for any fixed d.

No deterministic parallel algorithms were known previously, and the most efficient
sequential algorithm from [AF92] runs in O(M + n) time using quadratic space,
and in O((M + n) log k) time using linear space; k denotes the number of patterns
in the dictionary.

3. In the dynamic dictionary matching problem, we start with an initial dictionary
D and execute a sequence of insert, delete or match operations that are specified
on-line. Whereas insertions and deletions are meant to respectively add or delete a
given pattern from the “current dictionary”3 of size M, a match operation is meant
to find all occurrences of patterns from it in a given piece of text. We will use A to
denote the sizes of the patterns to be inserted or deleted, and that of the text to be
matched. Amir and Farach [AF91] distinguish the partly dynamic version of this
problem, where only insert and match operations are allowed, from its fully dynamic
variant, where deletions are allowed as well.

a. We present a parallel algorithm for partly dynamic dictionary matching that
has optimal speedup for insertions and matching. Both these operations take
O(A log M) work and O(log M) time.

In [AF91], Amir and Farach present a parallel algorithm that does not have
optimal speedup for matching the text, since it requires O(A log m log M) work to
implement this step. Also, insertions and text matching take O(log m log M)
time and hence are slower than the corresponding running times of our
algorithm. The best sequential algorithm for insertions and matching is also due
to Amir and Farach [AF91] and runs in O(A log M) time.

b. For fully dynamic dictionary matching, we present an optimal speedup
parallel algorithm that implements the delete operation in O(log M) time and
O(A log M) amortized work. Our parallel running time and work performed
for insertions and text-matching are identical to those for partly dynamic
dictionary matching stated above.

No deterministic parallel algorithms are known for this problem. The
best-known sequential algorithm for implementing deletions runs in O(A log M)
time [AFGGP91].

3Each operation is defined on the dictionary that exists after all the other operations preceding it from
the given sequence have been applied to D, in the order specified.

4.  Prefix-matching  that  we  characterize  in  this  paper  plays  a  significant  role  in  our 
approach  to  designing  parallel  algorithms  for  dictionary  matching.  Our  main  step 
is  to  design  extremely  efficient  parallel  algorithms  for  prefix-matching  (Theorems 
1,  7,  and  9),  which  we  then  use  to  achieve  the  above-mentioned  improvements 
for  dictionary  matching.  This  approach  works  since  prefix-matching  embodied  the 
bottlenecks  in  previously  known  algorithms  for  parallel  dictionary  matching. 

5. For the multi-dimensional pattern matching problem (and related problems) from
[KLP89,Rab93], we present parallel algorithms with optimal speedup. In
multi-dimensional pattern matching, we are given a pattern of size M and a text of size
n, both of which are cubes in d dimensions. By applying the shrink-and-spawn
technique, we derive an optimal speedup parallel algorithm for this problem that
runs in O(log m) time and O(n + M) work. Here, m = M^(1/d) denotes the number
of characters in each side of the pattern.

Kedem, Landau and Palem [KLP89] were the first to present an optimal speedup
parallel algorithm for this and related problems. Rabin [Rab93] presented elegant
randomized algorithms for these problems, also with optimal speedup, based on
fingerprinting. It is interesting to note that by using the shrink-and-spawn
technique, we are able to derive deterministic algorithms that are much simpler than
those from [KLP89]4 while preserving optimal speedup, even when the overall work
is linear in the input size.

Traditionally, the approach to designing very efficient and optimal speedup parallel
algorithms for string and pattern matching problems has relied on the notion of
periodicities [Ga84,Vi85,BG90,Vi90,Ga92,AB92,ABF93,CGRMR92]. Unfortunately, these
methods do not seem to scale well beyond two dimensions, or when multiple patterns are
given. Therefore, Kedem, Landau and Palem [KLP89] and Rabin [Rab93] approach the
problem  of  matching  in  higher  dimensions  very  differently.  Their  methods  and  those  of 
Apostolico  et  al.  [AILSV88]  for  parallel  construction  of  suffix  trees,  are  inspired  by  the 
naming  technique  of  Karp,  Miller  and  Rosenberg  [KMR72].  Naming  involves  successively 
refining  the  given  set  of  strings  into  equivalence  classes  of  increasing  size.  All  the  strings 
in  a  given  equivalence  class  are  identical,  and  are  given  a  unique  name  or  “certificate”. 
Currently known parallel algorithms based on these naming constructs, from [KLP89]
and [Rab93], lead to efficient parallel algorithms only when the patterns are of equal
length.

4Their algorithm uses a sophisticated parallel construction and simulation of the Aho-Corasick [AC75]
automaton which we do not need.

In order to cope with the more general and demanding situation wherein the
dictionary consists of patterns of unequal length, we introduce the shrink-and-spawn technique
that builds on the above-mentioned naming techniques. To better understand our
technique, let us consider the following oversimplified yet illustrative example. We are given
two strings A and B of lengths α and β respectively. The goal is to find all occurrences
of string A in B. Let L be the shrink (and spawn) parameter. During the shrinking step,
A is decomposed into non-overlapping substrings of size L and each of these substrings
is given a name as in [KMR72].5 The resulting shrunken string A' of length α/L is
simply the ordered composition of names given to locations (1, L + 1, 2L + 1, ...). Now in string B,
we replace each symbol with the name of the substring of L characters starting at that
position, with the same naming function used in the context of A. By doing this, we
spawn L copies from B, each of length β/L. Copy i is derived by composing the names
given to locations (i, i + L, i + 2L, ...). By executing this step once, we have effectively
reduced the size of one of the strings (A in this case) by a factor L without losing any
of the information needed for matching. This is because, in order to find matches in
B, we essentially need only find matches of the (smaller) string A' in each of
the spawned copies of B. The overall technique involves applying this shrink-and-spawn
step repeatedly, with appropriate choices of the parameter L. In [KLP89] this technique
was implicitly used for L = log α exactly once, to decrease the size of one of the strings
initially by a log α factor.

The rest of the paper is organized as follows: in Section 3, we define the basic
techniques used by our algorithms. Our algorithms for static dictionary matching are
described in detail in Section 4. The main ideas in our approach to solving dictionary
matching problems are highlighted in this section. In Section 5, we sketch the extensions
of these algorithms to higher dimensional dictionary matching. The modifications and
extensions needed to cope with dynamic dictionaries are outlined in Section 6. Finally, we
briefly mention the optimal speedup algorithms for problems including multi-dimensional
pattern matching (from [KLP89]) in Section 7.


2  Model,  Alphabet  Size  and  Remarks 

All our algorithms are designed using the arbitrary CRCW PRAM model [Ja92]. As in
[AILSV88] and [KLP89], we are concerned with an alphabet size that is polynomial in n
and M. All the bounds quoted thus far, including those for previously known algorithms
[AF91], are in the context of this alphabet size. If the input alphabet is unbounded, all
known sequential and parallel algorithms for dictionary matching, including that due to
Aho and Corasick [AC75], perform Ω((n + M) log M) work.

5We assume that α and β are multiples of L for ease of explanation; this is not a requirement in applying
the technique itself.

Our algorithms use up to m tables of size M^2 each. All of our techniques will work with
space O(M^(1+ε)) for some ε > 0 per table, as in the work of Apostolico et al. [AILSV88].
Our work bounds include the cost for initializing these tables based on the methods
of Hagerup [H88]. Randomization [GMV91] can be used to decrease the above space
requirements substantially.

Parallel  algorithms  for  dictionary  matching  specify  the  output  by  listing  for  each  text 
location,  the  longest  pattern  from  the  dictionary  that  matches  there.  An  alternate  output 
format  that  is  typically  used  in  the  sequential  case  is  to  list  for  each  text  location,  all 
the  patterns  that  match  there;  this  results  in  an  output-bound  computation.  Should  this 
format  be  required  in  the  parallel  setting,  even  for  static  dictionary  matching,  the  interval 
allocation  problem  [H92]  seems  to  be  inherent.  Indeed,  given  the  output  of  our  algorithm 
for  static  dictionary  matching,  the  algorithm  of  Hagerup  [H93]  for  interval  allocation  can 
be  used  to  output  for  each  text  location,  all  the  patterns  that  match  at  that  location; 
Hagerup’s algorithm takes O((log log n)^3) time and linear work.

For dynamic pattern matching, our algorithms and those due to Amir and Farach
[AF91] process the initial dictionary in O(log |D|) time and O(|D| log |D|) work. We
remark that Idury and Schaffer [IS91] have improved the sequential running time of the
(initial) dictionary processing step to O(|D|) using quadratic space. Also, randomized
algorithms for dynamic and higher dimensional dictionary matching can be found in
[AFM92].

3  Basic  Primitives 

We  introduce  three  important  operations  used  in  this  paper. 


3.1 Shrink-And-Spawn

Before  defining  this  operation,  we  need  to  define  the  following  additional  primitive. 

Naming 

Input: A set S of strings of length l.

Output: For each s_i ∈ S, an O(log |S|)-bit name denoted by δ(s_i) such that δ(s_1) = δ(s_2)
for s_1, s_2 ∈ S if and only if s_1 = s_2.

The function δ is a naming function. For a given set, there exist several naming
functions. In our applications, it is sufficient that we find one naming function. Using
naming we now define,

Shrink-and-Spawn 

Input: Two strings U = u_1 u_2 ... u_n and V = v_1 v_2 ... v_m and a parameter l that divides m.

Output: For a naming function δ, two sets of strings U' and V' described as follows. Set
V' contains the string δ(v_1 ... v_l) δ(v_{l+1} ... v_{2l}) ... δ(v_{m-l+1} ... v_m). Set U' consists of l
strings U^i for 1 ≤ i ≤ l, where the string U^i = δ(u_i u_{i+1} ... u_{l+i-1}) δ(u_{l+i} ... u_{2l+i-1}) ....
Residues of length at most l − 1 are ignored here.

The shrink-and-spawn operation utilizes the naming function to shrink the string V
by a factor of l and spawn l copies of U, while maintaining the following property: determining
all occurrences of the string V in U is equivalent to determining all occurrences of V'
in the strings of U'. Hence, shrink-and-spawn essentially serves as a “match-preserving” reduction of
a pattern matching problem to a “smaller” one, since the length of V' is m/l. For this
“match-preserving” property it is sufficient that only the substrings of length l which
actually appear in both U and V be named using the same function δ. The substrings
in U not found in V can be named using a set of special symbols distinct from the set of
special symbols used to name the substrings in V which are not found in U.
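
The reduction can be illustrated with a small sequential sketch. It is not the PRAM procedure of this paper: the function names (name_blocks, shrink_and_spawn, occurrences, occurrences_via_reduction) are hypothetical, positions are 0-indexed, and a Python dictionary stands in for the naming tables. For simplicity it names every length-l substring with one shared table, which is more than the match-preserving property requires.

```python
def name_blocks(blocks):
    """Assign each distinct block a small integer name (one possible naming function)."""
    table = {}
    for b in blocks:
        table.setdefault(b, len(table))
    return table


def shrink_and_spawn(u, v, l):
    """Return (l spawned copies of u, shrunken v) for a block length l dividing len(v)."""
    assert len(v) % l == 0
    v_blocks = [v[k:k + l] for k in range(0, len(v), l)]   # aligned blocks of v
    u_blocks = [u[k:k + l] for k in range(len(u) - l + 1)]  # every block of u
    delta = name_blocks(v_blocks + u_blocks)                # one shared naming table
    v_shrunk = [delta[b] for b in v_blocks]
    # Copy i reads the names at positions i, i + l, i + 2l, ...; residues are dropped.
    copies = [[delta[u[k:k + l]] for k in range(i, len(u) - l + 1, l)]
              for i in range(l)]
    return copies, v_shrunk


def occurrences(text, pat):
    """Naive matcher, used both directly and on the shrunken instance."""
    return [k for k in range(len(text) - len(pat) + 1)
            if list(text[k:k + len(pat)]) == list(pat)]


def occurrences_via_reduction(u, v, l):
    """Find v in u by matching the shrunken v in each spawned copy of u."""
    copies, v_shrunk = shrink_and_spawn(u, v, l)
    hits = []
    for i, copy in enumerate(copies):
        for k in occurrences(copy, v_shrunk):
            hits.append(i + k * l)   # index k of copy i is position i + k*l of u
    return sorted(hits)
```

For u = "abababcabab", v = "abab" and l = 2, both the direct matcher and the reduction report the same occurrence positions, which is the equivalence the operation is meant to preserve.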


3.2  Namestamping 

Consider a set S_1 of distinct tuples (x, y) where y is called the stamp and x is called the
element. Associate with each distinct element x in S_1 a namestamp denoted l(x), which
is the stamp of one of the tuples with element x.

Input: A set S_2 of tuples (x) where x is an element.

Output: To each tuple (x) ∈ S_2, the namestamp l(x).

Henceforth, by the phrase “namestamp set S_2 with S_1”, we mean solving the above
namestamping problem. Note the similarity between namestamping and table lookup. In
later sections, we introduce variants of namestamping that bring out its similarity with
standard dictionary operations.
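
The table-lookup view of namestamping can be sketched sequentially as follows. The function name namestamp and the use of None for elements absent from S_1 are choices made for this sketch only; the constant-time PRAM implementation is not modeled.

```python
def namestamp(s2, s1):
    """Namestamp s2 with s1.

    s1: iterable of (element, stamp) tuples.
    s2: iterable of elements.
    Returns, for each element of s2, the stamp of some tuple of s1 with that
    element, or None if s1 has no such tuple (a convention of this sketch).
    """
    table = {}
    for element, stamp in s1:
        # Any one stamp per element is acceptable; this sketch keeps the first seen.
        table.setdefault(element, stamp)
    return [table.get(x) for x in s2]
```

Elements may themselves be tuples of integers, as in the applications later in the paper, since any hashable value can index the table.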


3.3  Prefix-Naming 

Input:  A  set  S  of  strings. 

Output: To each location S_i(j) for some S_i ∈ S, the prefix-name denoted by δ(S_i(j)). For
each j, δ(S_i(j)) is a naming function for the prefixes S_i(1) S_i(2) ... S_i(j), taken over all i.

Note  that  each  distinct  prefix  in  S  is  uniquely  specified  by  its  prefix-name  and  its 
length. 
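
A minimal sequential sketch of prefix-naming is given below, assuming a single table keyed by (name of the shorter prefix, next character). The function name prefix_names and the reserved name 0 for the empty prefix are assumptions of this sketch. The names it produces are unique across all lengths, which is stronger than the definition requires.

```python
def prefix_names(strings):
    """Return names[i][j], the name of the prefix strings[i][:j + 1]."""
    table = {}              # (name of shorter prefix, next character) -> name
    names = []
    for s in strings:
        current = 0         # 0 is reserved here for the empty prefix
        row = []
        for ch in s:
            key = (current, ch)
            if key not in table:
                table[key] = len(table) + 1   # fresh name for an unseen prefix
            current = table[key]
            row.append(current)
        names.append(row)
    return names
```

Two locations receive the same name exactly when the prefixes ending there are identical, so equal prefixes of different strings can be recognized by comparing single integers.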

3.4  Computational  Issues 

In all our applications, any integer or symbol in a string that we consider fits into
one PRAM word. Namestamping a set S_2 with S_1 in which the elements and stamps
are each integers or tuples of integers (a, b) can be done using standard techniques
[KP88,AILSV88,KLP89].

Fact 1 Namestamping set S_2 with S_1 can be done in O(1) time using |S_1| + |S_2|
processors.

Prefix-naming, and hence naming, rely on namestamping. Prefix-naming is performed
by executing a standard prefix-sum computation [FL70] using the namestamping
operation in place of the standard arithmetic addition [KP88,KLP89].

Fact 2 Given a set of strings each of length at most m and total size M, deterministic
prefix-naming, and hence deterministic naming, can be done in O(log m) time and O(M)
work.
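
The idea of running a prefix-sum with pair-naming in place of addition can be sketched as follows. This is a sequential simulation under assumptions of our own: scan_prefix_names and combine are hypothetical names, a Python dictionary plays the role of the namestamping table, and the recursion mirrors the usual pairwise scan, so its depth is logarithmic in the string length. Names produced this way are only guaranteed to distinguish prefixes of the same length, in line with the remark that a prefix is specified by its prefix-name together with its length.

```python
def scan_prefix_names(strings):
    """Prefix-naming via a pairwise scan; pair-naming replaces addition."""
    table = {}

    def combine(a, b):
        # Name the ordered pair (a, b); equal pairs always get equal names.
        return table.setdefault((a, b), len(table) + 1)

    def scan(xs):
        if len(xs) <= 1:
            return list(xs)
        # Name adjacent pairs, then recurse on the half-length sequence.
        pairs = [combine(xs[k], xs[k + 1]) for k in range(0, len(xs) - 1, 2)]
        sub = scan(pairs)
        out = []
        for k in range(len(xs)):
            if k == 0:
                out.append(xs[0])                 # a length-1 prefix is its character
            elif k % 2 == 1:
                out.append(sub[k // 2])           # prefix ends on a pair boundary
            else:
                out.append(combine(sub[k // 2 - 1], xs[k]))
        return out

    return [scan(list(s)) for s in strings]
```

Because the way a prefix of length j is decomposed depends only on j, the same prefix occurring in two different strings is assembled from the same pairs and therefore receives the same name.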


4  Static  Dictionary  Matching  With  Strings 


Formally, the static dictionary matching problem with strings is as follows. A set of
distinct pattern strings D = {P_1, P_2, ..., P_k}, called the dictionary, is available for
precomputations. The index of the pattern P_i is i.


Input: A set T = {T_1, T_2, ..., T_κ} of text strings.

Output: For each j, the index of the longest pattern that matches at T_i(j), denoted by
I(T_i(j)).


The maximum length of any of the pattern strings is m. The size of the dictionary,
denoted by M, is the sum of the lengths of the individual pattern strings. The sum of
the lengths of the text strings is denoted by n. We denote a text location T_i(j) by τ for
notational convenience. Our algorithm proceeds as described below.


Step 1: For each text location τ, determine (i) δ_t(τ), the prefix-name of the longest
prefix in the dictionary that matches at τ, (ii) |δ_t(τ)|, its length, and (iii) I_p(τ), the
index of a pattern with this prefix. This problem is called static prefix-matching. The
shrink-and-spawn operation is applied recursively to achieve this step, as described in
detail in Section 4.1.

Step 2: Given δ_t(τ), |δ_t(τ)| and I_p(τ) for each text location τ, determine I(τ), the index
of the longest pattern that matches at τ. The pattern I(τ) is a prefix, and the longest
such, of any prefix in the dictionary of length |δ_t(τ)| with prefix-name δ_t(τ).

We next describe each step in detail. For convenience, assume both T and D are
presented simultaneously. The easy modification to the case when D is presented before
T is explained in Section 4.3.
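
As a reference for what the two steps are meant to compute, the following brute-force sketch may help. It is a specification-style illustration only, with hypothetical function names and 0-indexed positions, and it does not reflect how the parallel algorithm works or its bounds.

```python
def longest_prefix_match(dictionary, text, j):
    """Step 1 target: length of the longest prefix of any pattern matching text at j."""
    best = 0
    for p in dictionary:
        k = 0
        while k < len(p) and j + k < len(text) and p[k] == text[j + k]:
            k += 1
        best = max(best, k)
    return best


def longest_pattern_match(dictionary, text, j):
    """Step 2 target: index of the longest pattern matching text at j, or None."""
    best = None
    for idx, p in enumerate(dictionary):
        if text.startswith(p, j) and (best is None or len(p) > len(dictionary[best])):
            best = idx
    return best
```

The pattern reported for a location is never longer than the prefix reported there, which is why Step 2 can work from the output of Step 1 alone.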


4.1  Static  Prefix-Matching  (Step  1) 

Assume prefix-naming has been performed on D. With each location j in P_i ∈ D, we
have its prefix-name δ(P_i(j)). Prefix-matching is performed in two phases.

Phase 1. For each text location τ, determine δ_t(τ), the prefix-name of the longest prefix
in the dictionary that matches at τ, and |δ_t(τ)|, its length.

Phase 2. Given δ_t(τ) and |δ_t(τ)| for each text location τ, determine I_p(τ), the index
of a pattern in D that has this prefix. This is called the retrieve-index problem.

We  now  describe  Phase  1  in  detail.  Phase  2  is  performed  easily  using  a  namestamping 
operation. 

Algorithm  Description  (Phase  1) 

Let L be a parameter to be fixed later. Our algorithm for Phase 1 of static
prefix-matching has three steps:

1. Shrink-and-spawn Step. Shrink each string in D by a factor of L and spawn L
copies of each string in T. The resultant sets of text and pattern strings are T' and D'
respectively.

2. Recursive Step. Recursively solve Phase 1 of static prefix-matching on D' and T'.

For each location T'_i(j) in T', denoted by τ', the output is the prefix-name of the longest
prefix of any of the strings in D' that matches at τ', say δ_t(τ'), and |δ_t(τ')|, its length.

Equivalently, for each text location T_i(j), denoted by τ, the output is the prefix-name of
the longest prefix of any of the strings in D' that matches at τ, say α(τ), and |α(τ)|, its
length.

3. Extend-Right Step. For each text location T_i(j), denoted by τ, given α(τ), the
prefix-name of the longest prefix of any of the strings in D' that matches at τ, and its
length |α(τ)|, determine δ_t(τ), the prefix-name of the longest prefix of any of the strings
in D that matches at τ, and its length |δ_t(τ)|.

Note that Step 1 requires naming a set of strings of length L. In Step 2, prefix-names
for the strings in D' are required to solve the prefix-matching problem recursively. These
can be found from the prefix-names for the strings in D, since each string in D' is a prefix
of a string in D.

We now consider Step 3 in some detail. As before, we denote a text location T_i(j)
by τ for notational convenience. The longest prefix from D' that matches at τ, α(τ),
corresponds to a prefix in D. Let this prefix be β(τ). Its length is |β(τ)| = L|α(τ)|. By the
guarantee in our recursive step, no prefix in D' of length |α(τ)| + 1 matches at τ. This
implies that no prefix in D of length L|α(τ)| + L matches at τ. Hence, the prefix δ_t(τ) is
no more than L − 1 longer than β(τ). The task in Step 3 is to extend β(τ) in D to obtain
δ_t(τ). Let |δ_t(τ)| − |β(τ)| be the extension length. To determine δ_t(τ), check for each
possible extension length ℓ whether there exists a prefix in D of length |β(τ)| + ℓ that matches
at τ, as described below. Clearly, the extension length is the largest ℓ for which there exists a prefix
in D of length |β(τ)| + ℓ that matches at τ, but no prefix of D of length |β(τ)| + ℓ + 1
matches at τ. Correspondingly, δ_t(τ) is obtained.

It now remains to show, for each possible extension length ℓ, how we check if there
exists a prefix in D of length |β(τ)| + ℓ that matches at τ. Consider the following
incremental extension step: given the prefix-name of the prefix from D of length |β(τ)| + ℓ
that matches at τ, determine the prefix-name of the prefix from D of length |β(τ)| + ℓ + 1
that matches at τ. Clearly, this incremental extension step can be used to check all
possible extension lengths.

Informally, incremental extension works as follows: each prefix in the dictionary marks
a table at a location indexed by its prefix-name. Each text location generates the
prefix-name of the prefix of the desired length and checks the corresponding table location to
determine if there exists a prefix in D with that prefix-name. This procedure is
implemented using the namestamping operation as described below.

Let ν(T_i(j)) denote the prefix-name of the prefix in D of length |β(T_i(j))| + ℓ that
matches T_i at j. The goal in incremental extension is to determine the prefix-name of the
prefix of D of length |β(T_i(j))| + ℓ + 1 that matches T_i at j, if any. For each location
j in each text string T_i, generate a tuple

(ν(T_i(j)), T_i(j + |β(T_i(j))| + ℓ + 1)).

Consider the set S_t of all these tuples. Partition this set into sets S_t(λ) such that all
those tuples with |β(T_i(j))| + ℓ + 1 = λ belong to S_t(λ). From each position r in each of
the pattern strings P_j such that r ≡ (ℓ + 1) mod L, generate a tuple

Consider the set S_p of all such tuples. Partition this set into sets S_p(λ) such that all
pattern tuples with r = λ belong to S_p(λ). Namestamp set S_t(λ) with set S_p(λ) for each
λ. We claim the following: the stamp for the tuple corresponding to a text location T_i(j)
is the prefix-name of a prefix from D of length |β(T_i(j))| + ℓ + 1 that matches T_i at j
if one exists, and is 0 otherwise.
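
The incremental extension can be sketched sequentially as a lookup keyed by the target length λ. This sketch rests on an assumption about the form of the pattern tuples: the element pairs the prefix-name of length r − 1 with the character at position r, and the stamp is the prefix-name of length r. All function names are hypothetical, positions are 0-indexed, name 0 stands for the empty prefix, and a failed extension is reported as None rather than 0 to keep the two apart.

```python
def pattern_tuples(dictionary):
    """Build s_p[lam]: {(name of prefix of length lam - 1, character lam): name of length lam}.

    Tuples are generated for every position r, which covers all extension
    lengths at once; the text generates them only for r = (l + 1) mod L.
    """
    names = {}   # (name of shorter prefix, next character) -> name
    s_p = {}
    for p in dictionary:
        current = 0                                   # 0 names the empty prefix
        for r, ch in enumerate(p, start=1):
            stamp = names.setdefault((current, ch), len(names) + 1)
            s_p.setdefault(r, {})[(current, ch)] = stamp
            current = stamp
    return s_p


def incremental_extension(s_p, text, j, length, name):
    """Try to extend the prefix (name, length) matched at text position j by one character."""
    lam = length + 1
    pos = j + length                                  # next text character to consume
    if pos >= len(text):
        return None
    return s_p.get(lam, {}).get((name, text[pos]))    # the namestamping lookup


def extend_right(s_p, text, j, length, name, L):
    """Apply at most L - 1 incremental extensions, as in the Extend-Right step."""
    for _ in range(L - 1):
        stamp = incremental_extension(s_p, text, j, length, name)
        if stamp is None:
            break
        name, length = stamp, length + 1
    return name, length
```

With the dictionary {"abcd", "abx"} and the text "zabcq", starting from the empty prefix at position 1 and allowing four extensions yields the prefix "abc" of length 3.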

Implementation  and  Complexity. 

Let T(n, M, m) and W(n, M, m) denote respectively the time and work complexity
of this algorithm when the text strings have total size n, the pattern strings have total
size M and the length of the longest pattern string is m. The Extend-Right step takes
O(L) time and O(nL + M) work. To sum,

T(n, M, m) ≤ log L + T(n, M/L, m/L) + L
W(n, M, m) ≤ nL + M + W(n, M/L, m/L) + nL + M.

Setting L = 2 and solving,

Theorem 1 Phase 1 of prefix-matching for a text of size n, a set of patterns each of
length at most m and total size M can be solved in time O(log m) and work O(M +
n log m).
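
The two recurrences can be unrolled numerically to see where the bounds come from. The sketch below assumes a base case of zero cost once m ≤ 1, which the text does not state, and the function name phase1_cost is hypothetical. With L = 2 there are log m levels; each adds a constant to the time, 4n to the text work, and a geometrically shrinking term to the dictionary work.

```python
import math


def phase1_cost(n, M, m, L=2):
    """Unroll the recurrences for T and W; returns (time, work) up to constant factors."""
    time, work = 0.0, 0.0
    while m > 1:
        time += math.log2(L) + L        # naming the blocks, then Extend-Right
        work += 2 * (n * L + M)         # the two (nL + M) terms of the recurrence
        M, m = M / L, m / L             # the dictionary shrinks; the text does not
    return time, work
```

For m = 1024 the loop runs ten times, giving a time of 3 log m and a work total below 4 n log m + 4M, consistent with the bounds of Theorem 1.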
4.2 Finding Longest Pattern (Step 2)

For each prefix in D, we show how to determine its longest prefix that is a pattern.
Following this, I(τ) for any text location τ can be looked up from the prefix of length |δ_t(τ)|
and prefix-name δ_t(τ) computed in Step 1, using namestamping.

1.  Determine  for  each  location  Pt(j )  if  the  prefix  P;(l)  •  •  •  Pt(j )  is  a  pattern.  This  is  done 
using  namestamping.  The  output  is  an  auxiliary  array  A  of  size  M:  corresponding  to 
each  Pfij ).,  there  is  a  position  in  A  that  is  set  to  1  if  P;(l)  •  •  •  Pt(j )  is  a  pattern  and  to  0 
otherwise. 

2.  For  each  position  Pfij),  determine  the  largest  k  <  j  such  that  P2(l)  •  •  •  Pi{k)  is  a 
pattern.  This  is  equivalent  to  finding,  for  each  position  in  A ,  the  nearest  1  to  its  left. 

Theorem 2 For each location P_i(j) in D, its longest prefix that is a pattern can be
computed in O(log m) time and O(M) operations.


4.3  Putting  It  Together — Static  Dictionary  Matching 

Recall that T and D were presented simultaneously. The algorithm is slightly modified
if D is made available for preprocessing. Process the pattern strings by simulating their
role in the algorithm described earlier. In a manner similar to [AILSV88,KLP89], store
the various tables used in the individual steps. Subsequently, when T is presented, the text
strings are processed by simulating the algorithm using the appropriate tables.

Theorem 3 A set of patterns D, each of maximum length m, with total size M is
processed in O(log m) time and O(M) work. For each location in a text of length n, the
longest pattern that matches at that location can be determined in O(log m) time and
O(n log m) work.

From our description thus far, it easily follows that while preserving other bounds,
text processing can be performed with O(n log Δ) work where Δ = l_max - l_min, and l_max
(= m) and l_min denote the length of the longest and the shortest pattern respectively, in D.


4.4 More Efficient Dictionary Matching with a Small Alphabet Size

Our algorithm in the previous section is optimal in the size of the dictionary, but is
suboptimal in the text size. Intuitively, an approach towards achieving work optimality in
the text size in Phase 1 of Step 1 (see Section 4.1) is to "drop" some text locations as
the recursion (Step 2) progresses. The difficulty in doing this lies in inferring the longest
prefix that matches beginning at each "dropped" position given those at the remaining
positions. For example, consider the scenario in Figure 1. For convenience, we refer to
locations j - 1, j, j + 1 in text T_i by τ - 1, τ and τ + 1 respectively. Assume that the
longest prefixes from the dictionary that match at text locations τ - 1 and τ + 1 are
known and that, from these, the longest prefix at τ needs to be inferred. Clearly, the longest
prefix at τ can be arbitrarily long or short relative to the longest prefixes at τ - 1 and τ + 1.

Figure 1: Longest Prefixes at Neighborhood

As it turns out, we can relate the longest prefixes at neighboring positions by defining
the prefixes of the patterns carefully. Consider a dictionary P. Let P_s be the set of
strings derived from P by replacing each string in it by its suffix obtained by deleting the
leading symbol. Define V = P ∪ P_s. Let ψ(τ + 1) denote the longest prefix from V that
matches at τ + 1. Let φ(τ) denote the longest prefix from P that matches at τ. We claim
that φ(τ) is the longest prefix of the string obtained by concatenating ψ(τ + 1) to T_i(j)
that is also a prefix in P. Using this observation, we can efficiently compute φ values
from the ψ values as follows. Let t be a string of the form a||B (a concatenated with B)
where a is a text symbol and B is a prefix in V. Our task of computing φ values from ψ
values is essentially that of determining the longest prefix of a string of the form t that
is also a prefix in P. We accomplish the following, which is clearly sufficient for this task:
for every string of the form t, we determine its longest prefix that is also a prefix in P.
We accomplish this by considering a set of size |Σ| × |V| which contains all prefixes of
the form t and performing a computation similar to that described in Step 2 in Section
4.2. Note that this computation is alphabet-dependent. By systematically utilizing these
ideas with a variant of the shrink-and-spawn technique, we derive an algorithm for static
dictionary matching which we now describe.
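
The claim relating the longest prefix at one position to that at its right neighbor can be checked by brute force on a small instance. The sketch below is sequential and purely illustrative; the helper names (`prefix_closure`, `longest_in`, `phi_via_psi`) and the sample dictionary are assumptions of this example, not part of the algorithm.

```python
def prefix_closure(strings):
    """Every prefix (including the empty one) of every given string."""
    return {s[:k] for s in strings for k in range(len(s) + 1)}

def longest_in(s, prefix_set):
    """Longest prefix of s that belongs to a prefix-closed set."""
    k = 0
    while k < len(s) and s[:k + 1] in prefix_set:
        k += 1
    return s[:k]

def phi_via_psi(text, pos, patterns):
    """Longest dictionary prefix at pos, inferred from text[pos] and position pos + 1."""
    p_prefixes = prefix_closure(patterns)
    # V: the patterns together with their suffixes after deleting the leading symbol.
    v_prefixes = prefix_closure(list(patterns) + [p[1:] for p in patterns])
    psi_next = longest_in(text[pos + 1:], v_prefixes)
    return longest_in(text[pos] + psi_next, p_prefixes)

patterns = ["abab", "bca", "cab"]
text = "abcababca"
direct = [longest_in(text[i:], prefix_closure(patterns)) for i in range(len(text))]
inferred = [phi_via_psi(text, i, patterns) for i in range(len(text))]
```

On this instance `direct` and `inferred` agree at every position, which is what the claim predicts.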

Our algorithm, modified from that in Section 4, is more efficient when the alphabet set
Σ from which the strings are drawn is small. Our overall approach is to try and collapse
the text initially to length n/L for some parameter L, thereby retaining only a fraction of
the text positions. Subsequently we match this shrunken text on a suitable dictionary.
From the output, we construct the solution for each of the original text positions. Again
for convenience, we consider that D and T are provided simultaneously. We proceed as
described below.

Step 1. (Modified Shrink-and-Spawn) Consider L copies of each pattern string in P
obtained by successively dropping the leading symbol. Let the resultant set be V. Shrink
strings in T and V by L to obtain text and pattern sets T' and P' respectively.

Step 2. (Dictionary Matching) Solve static dictionary matching on T' and P'
using the algorithm in Section 4. The output for each text position T_i(j) such that j = kL + 1
for some integer k is the longest prefix from P' of length a multiple of L that matches
beginning at T_i(j).

Step 3. (Extend-Right) For each T_i(j) such that j = kL + 1 for some integer k, the
longest prefix from P' of length a multiple of L is extended by at most L - 1 positions to
the right as in Step 3 in Section 4.1 to obtain the longest prefix from V that matches at
T_i(j). From this, determine the longest pattern that matches at T_i(j) as in Section 4.2.

Step 4. (Extend-Left) Given the longest prefix from V that matches at T_i(j) where
j = kL + 1 for some k, extend this left and determine the longest prefix from V that
matches beginning at T_i(j - ℓ) for 1 ≤ ℓ ≤ L - 1. From this, determine the longest
pattern that matches at T_i(j - ℓ).
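
A minimal sequential sketch of Step 1 follows. It assumes that shrinking by L means replacing each aligned block of L symbols with a name shared by equal blocks, and it simply drops a trailing residue shorter than L; the exact shrinking procedure is the one defined in Section 4.1, so the function names and the sample strings here are hypothetical.

```python
def spawned_suffixes(patterns, L):
    """The L copies of each pattern obtained by successively dropping the leading symbol."""
    return [p[d:] for p in patterns for d in range(L)]

def shrink(strings, L):
    """Replace each aligned block of L symbols by a name; equal blocks share a name."""
    names = {}
    shrunk = []
    for s in strings:
        blocks = [s[i:i + L] for i in range(0, len(s) - L + 1, L)]
        shrunk.append([names.setdefault(b, len(names) + 1) for b in blocks])
    return shrunk, names

V = spawned_suffixes(["abcabc"], 2)           # ["abcabc", "bcabc"]
shrunk, names = shrink(V + ["xabcabc"], 2)    # the last string plays the role of the text
```

Here the shrunken text `[4, 3, 1]` contains the shrunken copy `[3, 1]` at its second symbol, reflecting that the suffix copy "bcabc" matches the text at an aligned position even though the original pattern starts at an unaligned one.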

That completes the description of our algorithm at a high level. Consider each step
in some detail. The definition of set V is critical in being able to perform the Extend-Left
step efficiently. P_i^j, obtained from P_i ∈ P by dropping the first j positions, is called the
j-suffix of P_i. A ≤k-suffix of P_i refers to any P_i^j for j ≤ k. Hence, V is the collection of
the ≤L-suffixes of the pattern strings in P. The first three steps follow from Section 4.
We now consider the Extend-Left step. For each T_i(j) such that j = kL + 1 for some
integer k, the longest prefix of the ≤L-suffixes of P that matches beginning at T_i(j) is given as
input to the Extend-Left step. Let this be ψ(T_i(j)). Without loss of generality, consider
a fixed window T_i(j - L) ... T_i(j) henceforth for discussions. Extend-Left is done in two
steps.

Step A. For each location T_i(j - ℓ) compute α(ℓ) defined iteratively as follows: α(0) =
ψ(T_i(j)) and α(ℓ + 1) is the longest prefix of T(j - ℓ - 1)||α(ℓ) in V. Here the symbol
|| stands for string concatenation. The value α(ℓ) satisfies the following property: the
longest prefix of α(ℓ) which is a prefix in the ≤(L - ℓ)-suffixes of P is the longest prefix
of the ≤(L - ℓ)-suffixes of P that matches at j - ℓ.

Step B. For each location T_i(j - ℓ), determine the longest pattern that is a prefix of
α(ℓ). Since the longest prefix of the ≤(L - ℓ)-suffixes of P that matches at j - ℓ is a prefix
of α(ℓ), the longest pattern in particular that matches at T_i(j - ℓ) is a prefix of α(ℓ).
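
The two steps can be illustrated with a small sequential sketch. It assumes a precomputed table that maps a pair (symbol a, prefix p in V) to the longest prefix of a||p that is a prefix in V, so that each move to the left is one lookup. All names (`build_tables`, `extend_left`, `left_table`) and the sample data are hypothetical, and plain dictionary lookups stand in for namestamping.

```python
def prefix_closure(strings):
    """Every prefix (including the empty one) of every given string."""
    return {s[:k] for s in strings for k in range(len(s) + 1)}

def longest_in(s, prefix_set):
    """Longest prefix of s that belongs to a prefix-closed set."""
    k = 0
    while k < len(s) and s[:k + 1] in prefix_set:
        k += 1
    return s[:k]

def build_tables(patterns, L):
    """Dictionary-side work: prefixes of P, prefixes of V, and the move-left table."""
    p_prefixes = prefix_closure(patterns)
    v_prefixes = prefix_closure([p[d:] for p in patterns for d in range(L + 1)])
    alphabet = set("".join(patterns))
    left_table = {(a, p): longest_in(a + p, v_prefixes)
                  for a in alphabet for p in v_prefixes}
    return p_prefixes, v_prefixes, left_table

def extend_left(text, j, L, tables):
    """Longest dictionary prefix at positions j, j-1, ..., j-(L-1) (0-based j)."""
    p_prefixes, v_prefixes, left_table = tables
    alpha = longest_in(text[j:], v_prefixes)           # alpha(0), the input psi value
    answers = {0: longest_in(alpha, p_prefixes)}
    for step in range(1, L):
        if j - step < 0:
            break
        # Step A: one table lookup per move; symbols outside the alphabet give "".
        alpha = left_table.get((text[j - step], alpha), "")
        # Step B: the longest prefix from P is a prefix of alpha.
        answers[step] = longest_in(alpha, p_prefixes)
    return answers

patterns = ["abab", "babc", "cab"]
text = "abcababcab"
tables = build_tables(patterns, 3)
window = extend_left(text, 2, 3, tables)
```

For the window ending at position 2 this yields "cab", "b" and "ab" for the three positions from right to left, the same answers a direct scan of the text gives.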

That completes our description of the Extend-Left step. Step B follows easily from
Section 4.2. It remains to demonstrate how α(ℓ) is computed in Step A. Consider the set
P'' obtained from each p ∈ V by replacing it with a||p for each a ∈ Σ. For each prefix
in P'', compute into a table the longest prefix from V as in Section 4.2. Now, the string
T(j - ℓ - 1)||α(ℓ) is a prefix in P'' for any T(j - ℓ - 1). Thus, α(ℓ + 1) is looked up from
this table using namestamping.

That completes the entire description of our algorithm. Step 1 takes O(log L) time
and O(n + ML) work to perform the naming. The sets T' and P' are of sizes n/L
and M respectively. The length of the longest pattern in P' is m. From Theorem 1,
Step 2 takes O(log m) time and O(n log m / L + M) work. Step 3 takes O(L) time and
O(ML + (n/L) × L) = O(n + ML) work. The one-time computation of all α values in Step 4
takes O(log m) time and O(ML|Σ|) work. This is the alphabet-dependent computation
in the entire algorithm. The rest of Step 4 takes O(L) time and O(n) work. Hence,

Theorem 4 A dictionary of size M and longest pattern of length at most m is processed
in O(log m) time and O(M|Σ|L) work. A text of size n can be matched against this
dictionary in O(L + log m) time and O(n log m / L) work for any L ≤ log m.

Corollary 1 Let |Σ| = o(log m). A static dictionary can be processed in O(log m) time
and O(M √(log m |Σ|)) work. The text of size n can be matched against this dictionary in
O(log m) time and O(n √(log m |Σ|)) work.

Corollary 2 For any |Σ|, static dictionary matching can be performed in O(log m) time.
Dictionary processing involves O(M|Σ| log m) work and text processing involves O(n)
work.
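
Both corollaries appear to follow from Theorem 4 by a particular choice of L. The choices below are our reading of how the stated bounds arise, not values given explicitly in the text; rounding L to an integer does not affect the asymptotics.

```latex
\begin{align*}
\text{Theorem 4:}\quad
  & W_{\mathrm{dict}} = O(M\,|\Sigma|\,L), \qquad
    W_{\mathrm{text}} = O\!\left(\frac{n \log m}{L}\right), \qquad 1 \le L \le \log m. \\[4pt]
\text{Corollary 1:}\quad
  & L = \sqrt{\frac{\log m}{|\Sigma|}}
    \;\Longrightarrow\;
    M\,|\Sigma|\,L = M\sqrt{|\Sigma| \log m}, \qquad
    \frac{n \log m}{L} = n\sqrt{|\Sigma| \log m}. \\[4pt]
\text{Corollary 2:}\quad
  & L = \log m
    \;\Longrightarrow\;
    M\,|\Sigma|\,L = M\,|\Sigma| \log m, \qquad
    \frac{n \log m}{L} = n.
\end{align*}
```

In both cases L ≤ log m, so the text-processing time O(L + log m) stays O(log m); the condition |Σ| = o(log m) in Corollary 1 keeps the first choice of L at least 1.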

A slightly more general theorem can be shown as follows. For the purposes of decreasing
the alphabet-dependent complexity, consider the Extend-Left step. Encode each symbol
in the dictionary using a distinct binary code of length log |Σ|. The resultant dictionary
is of size M log |Σ|. Perform operations as described earlier to move left by one bit.
This takes O(log m + log log |Σ|) time and O(ML log |Σ|) work. Correspondingly, to perform
Extend-Left while text processing, consider the text symbols replaced by their binary
codes of length log |Σ|. As before, text processing involves moving to the left by L
positions. For each of these positions, move left log |Σ| times, one bit at a time. It follows
that,

Theorem 5 A dictionary of size M and longest pattern of length at most m is processed
in O(log m + log log |Σ|) time and O(ML log |Σ|) work. A text of size n can be matched
against this dictionary in O(L log |Σ| + log m) time and O(n log m / L + n log |Σ|) work for
any positive integer L.

Assume |Σ| < m (stronger than Corollary 1). Set L = log m / log |Σ|. The time
for static dictionary matching is O(log m). The work becomes O(n log |Σ| + M log m).
Thus text processing involves O(n log m) work while dictionary processing involves only
O(M log m) work.

5 Higher Dimensional Dictionary Matching


We now discuss the two dimensional dictionary matching problem. Extensions to
d-dimensional dictionary matching for a fixed d are straightforward. Consider the static
two dimensional dictionary matching problem. We have a set D, called the dictionary, of
pattern arrays {P_1, P_2, ..., P_a} where each P_i is a p_i × p_i square, given for preprocessing.

Input: The set T of text arrays {T_1, T_2, ..., T_K} where each T_i is a t_i × t_i square.
Output: For each location (i, j) in T_k ∈ T, the index of the pattern with largest sides
that matches at T_k(i, j).

A match between two dimensional arrays is a standard notion [B77,Ba78]. Let
$\sum_{P_i \in D} p_i^2 = M$ and $\sum_{T_i \in T} t_i^2 = n$. Furthermore, each p_i ≤ m. Define a square-prefix
of an array to be a subsquare at the top left corner. Also, prefix-names for squares
are defined analogous to prefix-names for strings. We extend our algorithm for static
dictionary matching with strings (from Section 4) to the case of square arrays. We first
describe some details of Step 1, namely, "two dimensional prefix-matching" defined below.
The input to two dimensional prefix-matching is the same as that for two dimensional
dictionary matching. However, the output is as follows:

Output: For each text location T_k(i, j), denoted by τ, compute δ(τ), |δ(τ)| and I(τ)
such that the square-prefix of largest sides from P that matches at τ is a |δ(τ)| × |δ(τ)|
square-prefix with prefix-name δ(τ), and I(τ) is the index of a pattern with this prefix.

Suppose that each location P_k(i, j) in P_k ∈ P is assigned its prefix-name. Formally,
prefix-naming a set of arrays S is defined as follows. Consider the set S(i, j) of all the
subarrays S_k(1 ... i, 1 ... j) for every array S_k ∈ S. Each of the sets S(i, j) is individually
named. This name is called the prefix-name. Note that we are implicitly using a more
general notion of a prefix of an array, namely a rectangular subarray with the same top left
corner.

While solving the two dimensional prefix-matching problem, the largest square-prefix that
matches at a text location is specified by its prefix-name and dimensions. For the pattern
arrays, it is guaranteed that the prefix-names of all the square-prefixes would have been
determined. This is done recursively.

Step 1. (Two Dimensional Shrink-and-Spawn) Consider two sets P_r and P_c where P_r is
obtained by stripping the top row in each pattern in P and P_c is obtained by stripping
the leftmost column in each pattern in P. Let P' = P ∪ P_r ∪ P_c. Note that |P'| ≤ 3M.
Consider a naming of all subarrays of T and P' of size 2 × 2. Let this naming function
be S'.

We shrink the patterns in the following sense. Consider the set of all patterns
P'' = {P'_1, P'_2, ...} where P'_k is obtained from P_k ∈ P' as follows. Consider each
location (i, j) such that i and j are odd. Replace P_k(i ... (i + 1), j ... (j + 1)) by its
name S'(P_k(i ... (i + 1), j ... (j + 1))). That is, P'_k is generated from P_k by shrinking each
disjoint subarray of size 2 × 2 into a single symbol using names. Note that there could
be a residue consisting of one row and one column of some of the pattern arrays. These
residues would be considered later.

We spawn copies of text arrays in the following sense. Consider the set of all text
arrays T' such that each text array T_i is replaced by 4 text arrays T_i^j for 1 ≤ j ≤ 4.
Each of these arrays is obtained from an appropriate array derived from T_i by first tiling
it with subarrays of size 2 × 2 and subsequently replacing each subarray with its name
S'. The four arrays on which these two operations are performed so as to yield T_i^j for
1 ≤ j ≤ 4 are 1. T_i(1 ... t_i, 1 ... t_i), 2. T_i(2 ... t_i, 1 ... t_i), 3. T_i(1 ... t_i, 2 ... t_i), and 4.
T_i(2 ... t_i, 2 ... t_i). That is, each text array spawns off 4 copies by replacing each subarray
of size 2 × 2 in the original text array by a single symbol.
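
The shrinking of patterns and the spawning of the four text copies can be sketched as follows. This is a sequential illustration under assumptions: a Python dictionary plays the role of the naming function S', residue rows and columns are simply skipped, and the function names are hypothetical.

```python
def shrink_2x2(array, names, top=0, left=0):
    """Tile array[top:][left:] with disjoint 2x2 blocks and replace each block by its name."""
    rows, cols = len(array), len(array[0])
    shrunk = []
    for i in range(top, rows - 1, 2):
        shrunk_row = []
        for j in range(left, cols - 1, 2):
            block = (array[i][j], array[i][j + 1], array[i + 1][j], array[i + 1][j + 1])
            shrunk_row.append(names.setdefault(block, len(names) + 1))
        shrunk.append(shrunk_row)
    return shrunk

def spawn_text(text, names):
    """The four shrunken copies, in the order listed above (offsets are 0-based)."""
    offsets = [(0, 0), (1, 0), (0, 1), (1, 1)]
    return [shrink_2x2(text, names, top, left) for top, left in offsets]

names = {}
pattern = [[1, 2],
           [3, 4]]
text = [[0, 0, 0],
        [0, 1, 2],
        [0, 3, 4]]
shrunk_pattern = shrink_2x2(pattern, names)
copies = spawn_text(text, names)
```

The pattern occurs in the text at an offset of one row and one column, so it is the fourth copy that carries the pattern's name; this is why all four offsets are spawned.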

Step 2. (Computations for Extend-Right) We perform the following two operations on
P'.

Step 2a. Prefix-name the set P'.

Prefix-names for P' are computed in two steps. First, compute the prefix-names for the
set of strings obtained by considering each row of each of the arrays in P' as a string.
Each location (i, j) in each array P_k is associated with the prefix-name δ_1(P_k(i, j)). This
is the name for the prefix P_k(i, 1) ... P_k(i, j). Consider an auxiliary array for each array
P_k ∈ P', denoted by P'''_k, where P'''_k(i, j) = δ_1(P_k(i, j)). Let P''' be the collection of these
auxiliary arrays. Consider each column of these arrays as a string and perform another
prefix-naming. Each location (i, j) in each auxiliary array P'''_k is associated with the
prefix-name δ_2(P'''_k(i, j)). This is the name for the prefix P'''_k(1, j) ... P'''_k(i, j).

Lemma 1 The function δ_2 assigns valid prefix-names for the arrays in P'.

Proof: Consider two rectangles, that is, the (1 ... i, 1 ... j) entries in two pattern
arrays A and B. Their auxiliary arrays are A' and B' respectively. The claim is that
δ_2(A'(i, j)) = δ_2(B'(i, j)) if and only if A(1 ... i, 1 ... j) = B(1 ... i, 1 ... j). Assume
δ_2(A'(i, j)) = δ_2(B'(i, j)). This implies A'(1 ... i, j) = B'(1 ... i, j) since δ_2 is a prefix
naming for the columns of the auxiliary arrays. This in turn implies that δ_1(A(k, j)) = δ_1(B(k, j)) for all
k ∈ 1 ... i. Thus A(k, 1 ... j) = B(k, 1 ... j) for all k ∈ 1 ... i since δ_1 is a prefix-name for
the rows of P'. Hence, A(1 ... i, 1 ... j) = B(1 ... i, 1 ... j). The other direction is seen
similarly. □
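
The two-pass naming behind the lemma can be sketched sequentially as below. The sketch assumes that prefix-naming a set of sequences assigns a name to each (name of shorter prefix, next symbol) pair; the function names are hypothetical and the dictionaries stand in for the naming tables.

```python
def name_prefixes(seqs):
    """Name every prefix of every sequence; equal prefixes receive equal names."""
    ids, out = {}, []
    for seq in seqs:
        prev, named = 0, []          # 0 names the empty prefix
        for x in seq:
            prev = ids.setdefault((prev, x), len(ids) + 1)
            named.append(prev)
        out.append(named)
    return out

def prefix_name_arrays(arrays):
    """Name each rectangle arrays[k][0..i][0..j]: rows first, then columns of the row names."""
    # Pass 1: rows of all arrays are named together, giving the auxiliary arrays.
    row_named = name_prefixes([row for a in arrays for row in a])
    aux, pos = [], 0
    for a in arrays:
        aux.append(row_named[pos:pos + len(a)])
        pos += len(a)
    # Pass 2: columns of all auxiliary arrays are named together.
    col_named = name_prefixes([[row[j] for row in b] for b in aux for j in range(len(b[0]))])
    result, pos = [], 0
    for b in aux:
        width = len(b[0])
        cols = col_named[pos:pos + width]
        pos += width
        result.append([[cols[j][i] for j in range(width)] for i in range(len(b))])
    return result

A = [[1, 2], [3, 4]]
B = [[1, 2], [3, 5]]
names_A, names_B = prefix_name_arrays([A, B])
```

The two arrays differ only in their bottom right entry, so every rectangle name agrees except the one for the full 2 × 2 rectangle.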

Step 2b. For each square-prefix in P', determine its longest square-prefix that is a
square-prefix in P. This is done as described in Section 4.2 corresponding to Step 2
in the static dictionary matching algorithm. Similarly, determine for each square-prefix
in P', its longest square-prefix that is a square-prefix in P_r. Also determine for each
square-prefix in P', its longest square-prefix that is a square-prefix in P_c.

Step 3. (Recursive Step) Recursively determine, for each text location τ in T', the
square-prefix of longest dimensions from P'' that matches at τ. This additionally returns
the prefix-names for the square-prefixes of P''.

Step 4. (Extend Right) This is the most complicated step of all.

Step 4a. Generate the prefix-names for the square-prefixes of P. Note that the
square-prefixes of even dimensions are retained as square-prefixes in P''. Therefore, it remains
to determine the prefix-names for the square-prefixes of odd dimensions. Consider all
square-prefixes of dimension (2i + 1) × (2i + 1) in parallel. Consider in particular the
square-prefix P_k(1 ... (2i + 1), 1 ... (2i + 1)) for some integer i.

Note that P_k(1 ... 2i, 1 ... 2i), P_k(1 ... 2i, 2 ... (2i + 1)) and P_k(2 ... (2i + 1), 1 ... 2i) are
square-prefixes in P, P_c and P_r respectively; furthermore, they are all of even dimensions.
Hence, their prefix-names have been determined recursively. Denote their prefix-names
by n_e, n_c and n_r respectively. The square-prefixes of dimension (2i + 1) × (2i + 1) are
assigned prefix-names by namestamping with the tuple

((n_e, n_r, n_c, P_k(2i + 1, 2i + 1)), (k))

generated one per each square-prefix P_k(1 ... (2i + 1), 1 ... (2i + 1)).
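
The reason this tuple suffices is that the three even squares and the corner symbol together cover every cell of the odd square. A small sketch, with hypothetical names and a single dictionary-backed `Namer` standing in for the recursively computed prefix-names, illustrates this:

```python
class Namer:
    """Assigns one integer name per distinct key (a stand-in for recursive prefix-names)."""
    def __init__(self):
        self.ids = {}

    def __call__(self, key):
        return self.ids.setdefault(key, len(self.ids) + 1)

def square(a, top, left, size):
    """The size x size subarray of a with the given top-left corner (0-based)."""
    return tuple(tuple(row[left:left + size]) for row in a[top:top + size])

def odd_prefix_tuple(a, i, name):
    """Tuple identifying the (2i+1) x (2i+1) square-prefix of a."""
    s = 2 * i
    n_e = name(square(a, 0, 0, s))   # even square-prefix of the pattern itself
    n_c = name(square(a, 0, 1, s))   # same size, leftmost column stripped
    n_r = name(square(a, 1, 0, s))   # same size, top row stripped
    return (n_e, n_r, n_c, a[s][s])  # plus the bottom-right corner symbol

name = Namer()
A = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
B = [[1, 2, 3], [4, 5, 6], [0, 8, 9]]   # differs from A in the bottom-left cell only
C = [row[:] for row in A]
key_A, key_B, key_C = (odd_prefix_tuple(x, 1, name) for x in (A, B, C))
```

Equal 3 × 3 squares produce equal tuples and the single differing cell of `B` changes its tuple, so stamping the tuples yields valid names for the odd square-prefixes.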

Step 4b. For each text location τ, the longest square-prefix from P'' that matches at
τ is provided. This is a square-prefix in P'. Let this prefix be α(τ). Note that this is a
2i × 2i square, for some i. There are two cases.

1. If α(τ) is not a square-prefix in any pattern in P, then the largest square-prefix
from P is the largest square-prefix of α(τ) that is a square-prefix in P.

2. Say α(τ) is a square-prefix of some pattern in P. Then, one of the following is
true. Either α(τ) is the largest square-prefix from P that occurs at τ or the largest
square-prefix in P that occurs at τ has sides of length (2i + 1) × (2i + 1).

Case 1 above is taken care of as in Section 4.2. We show how to take care of Case
2. The first task is to check if a (2i + 1) × (2i + 1) square-prefix of any pattern occurs
at τ. If none such occurs, the longest square-prefix that matches at τ is indeed α(τ). If
such a square-prefix exists, then the second task is to determine the prefix-name of this
square-prefix. Both these tasks are accomplished using namestamping, which we describe
below.

Generate a set S_p of tuples from P. The set S_p(i) consists of tuples in S_p generated
by the pattern positions (2i + 1, 2i + 1) in some pattern. Consider one such position
P_k(2i + 1, 2i + 1). This location generates the following tuple

((n_e, n_r, n_c, P_k(2i + 1, 2i + 1)), (k))

where n_e, n_c and n_r are the prefix-names of the squares of size 2i × 2i, respectively given
by P_k(1 ... 2i, 1 ... 2i), P_k(1 ... 2i, 2 ... (2i + 1)) and P_k(2 ... (2i + 1), 1 ... 2i). These are
square-prefixes in P' and their prefix-names have been ascertained recursively.

Generate the set S_t of tuples from the text. The set S_t(i) consists of tuples generated
by those text locations τ for which α(τ) is a 2i × 2i square. Consider one such text
location τ = T(j, k). Let its neighbor immediately to the right be τ_c and the neighbor
immediately below be τ_r. Let the longest square-prefixes of P' that match at τ_r and τ_c
be α(τ_r) and α(τ_c) respectively. We determine if a (2i + 1) × (2i + 1) square-prefix of any
pattern occurs at τ. The text location τ generates the following tuple

(n'_e, n'_r, n'_c, T(j + 2i, k + 2i))

where n'_e, n'_r and n'_c are the prefix-names of the 2i × 2i square-prefix of the pattern, if any,
which occur at τ, τ_r and τ_c, respectively. Note that n'_e corresponds to the prefix-name
of α(τ). We show how to obtain n'_c; n'_r is similarly obtained.

Consider α(τ_c). If α(τ_c) were a square of size (2i - 2) × (2i - 2) or smaller, then there
does not exist any (2i + 1) × (2i + 1) square-prefix of P that occurs at τ. This is because,
if α(τ_c) were a square of size (2i - 2) × (2i - 2) or smaller, then the longest prefix from P
can be a square of at most (2i - 1) × (2i - 1) in size. Note that α(τ_c) cannot be of size
(2i - 1) × (2i - 1) since it is an even sided square. If α(τ_c) were of size 2i × 2i or larger,
then determine its largest square-prefix which is a square-prefix in P_c. This is sufficient
to determine n'_c, if any.

Having generated the text and pattern tuples, namestamp the set S_t(i) with the set
S_p(i) for each i in parallel. This provides each text location τ with the prefix-name of
the square-prefix of size (2i + 1) × (2i + 1), if any existed in P.

That completes our description of the recursive shrink-and-spawn steps for two
dimensional prefix matching. For each location T_k(i, j), given the pattern square-prefix of
largest dimensions that matches at that location, it remains to extract the pattern array
of largest dimensions that matches beginning at that location. This is performed as in
the second phase described in Section 4.2. The only difference is that we look at the
diagonal for determining the longest pattern that is the prefix of the given match at a
text location.

We consider the complexity next. This brings up a detail concerning Steps 2a and
2b. These steps are not performed at each recursive level. If they were, each level of recursion
would take O(log m) time, and the entire algorithm would then take O(log^2 m) time.
Rather, we wait for the recursion to reach its lowermost level and then, in parallel, these
steps are executed for each recursion level. Following this, "unwinding" of the recursive
levels continues. Hence, over all recursive levels, these two steps take O(log m) time and
O(M) work. The naming in Step 1 takes O(1) time and O(n + M) work. As a result,
the text size remains as n while the dictionary size falls to 3M/4. The dimension of the
largest pattern in P' is m/2 × m/2. Step 4 takes O(1) time and O(n + M) work. Thus
we claim,

Theorem 6 A dictionary of two dimensional patterns of total size M such that the
maximum size of any pattern is m × m can be preprocessed in O(log m) time and O(M) work.
Furthermore, it can be matched against a set of text arrays of total size n in O(log m)
time and O(n log m) work.
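
The bounds in the theorem can be read off from the per-level costs stated in the complexity discussion. The recurrences below are a reconstruction under the assumption that each level costs c(n + M) work for a hypothetical constant c, with Steps 2a and 2b charged once over all levels:

```latex
\begin{align*}
W(n, M, m) &\le c\,(n + M) + W\!\left(n, \tfrac{3M}{4}, \tfrac{m}{2}\right) \\
           &\le c\,n \log m + c\,M \sum_{k \ge 0} \left(\tfrac{3}{4}\right)^{k}
            \;=\; O(n \log m + M), \\[4pt]
T(n, M, m) &\le O(1) + T\!\left(n, \tfrac{3M}{4}, \tfrac{m}{2}\right)
            \;=\; O(\log m), \quad \text{plus } O(\log m) \text{ once for Steps 2a and 2b.}
\end{align*}
```

The pattern-dependent terms form a geometric series and sum to O(M), while the text-dependent term is paid at each of the log m levels, which matches the split between preprocessing and text work in the theorem.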


6  Dynamic  Dictionary  Matching 

6.1  Partly  Dynamic  Dictionary  Matching 

In partly dynamic dictionary matching, we have an initial dictionary D_0 and are given
an arbitrary sequence of insert and match operations, on-line. The ith insert operation
is insert(P, D_{i-1}) where pattern P has length p_i. It involves adding P to the dictionary
D_{i-1}. The resulting dictionary is D_i. The jth match operation is match(t_j, D_{i'}), which
involves matching the text t_j of n_j characters into D_{i'}, where there are exactly i' inserts
in the sequence of operations before the jth match. The output for each location k in t_j
is the longest pattern from D_{i'} that matches at k. Each dictionary D_i has size M_i with a
longest pattern of length m_i.

6.1.1  Partly  Dynamic  Prefix-Matching 

First we consider the prefix-matching version of the problem. In partly dynamic
prefix-matching, the output for the operation match(t_j, D_{i'}) is, for each text location t_j(k), the
longest prefix from D_{i'} which occurs beginning at t_j(k). We modify our algorithm in
Section 4.1 to perform partly dynamic prefix-matching.

Our modifications to the algorithm in Section 4.1 concern namestamping. We first
define the partly dynamic namestamping problem; namestamping from Section 3 is
henceforth called standard namestamping. A set of tuples S_0 is initially processed as in
standard namestamping. A sequence of two kinds of operations can then be performed. The operation
insert(S, S_i) is the (i + 1)st insertion in which a set S of tuples is added to S_i as follows:
the elements in S are assigned namestamps consistent with those in S_i. That is, if an
element x in S is in S_i, then l(x) is unchanged. The elements in S and not in S_i are assigned
namestamps as defined in Section 3. Following this operation, S ∪ S_i = S_{i+1}. The
operation namestamp(A, S_i) is performed after the ith insertion and before the (i + 1)st, and it
namestamps A with S_i as in standard namestamping. We claim that by modifying the
standard namestamping procedure in Section 3 slightly, the following bounds are achieved
for partly dynamic namestamping: S_0 is processed in O(1) time using O(|S_0|) processors,
following which insert(S, S_i) takes O(1) time using |S| processors and namestamp(A, S_i)
takes O(1) time using O(|A|) processors.
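
The interface of partly dynamic namestamping can be illustrated with a sequential stand-in. The sketch assumes that a namestamp is a distinct positive integer per distinct tuple and that 0 marks a tuple absent from the current set; the class and method names are hypothetical, and a hash table replaces the parallel tables of Section 3.

```python
class DynamicNamestamps:
    """Sequential stand-in for partly dynamic namestamping."""

    def __init__(self, initial):
        self.stamp = {}
        self.insert(initial)            # process S_0

    def insert(self, tuples):
        """Add tuples; stamps of tuples already present are left unchanged."""
        for t in tuples:
            self.stamp.setdefault(t, len(self.stamp) + 1)

    def namestamp(self, queries):
        """Stamp each query tuple with the stamp of the equal stored tuple, or 0."""
        return [self.stamp.get(q, 0) for q in queries]

stamps = DynamicNamestamps([("a", 1), ("b", 2)])
before = stamps.namestamp([("a", 1), ("c", 3)])
stamps.insert([("c", 3), ("a", 1)])
after = stamps.namestamp([("a", 1), ("c", 3)])
```

The stamp of ("a", 1) is the same before and after the insertion, which is the consistency requirement stated above; only the newly inserted tuple receives a fresh stamp.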

We now return to partly dynamic prefix matching. The initial dictionary is
preprocessed as in the static prefix-matching algorithm in Section 4.1 and the insert(P, D_{i-1})
operation is performed by simulating this algorithm for dictionary processing on P. In
both these steps, all standard namestamping procedures are replaced by partly dynamic
namestamping. Recall that standard namestamping steps were involved in Section 4.1
in prefix-naming, naming (shrink-and-spawn), Extend-Right and retrieve-index problems
(Phase 2).


Theorem 7 The initial dictionary D_0 in which the longest pattern is of length at most
m_0 can be processed in O(log m_0) time and O(M_0) work. Following this, the insert(P, D_{i-1})
operation can be performed in O(log p_i) time and O(p_i) work. A match(t_j, D_{i'}) is
performed in O(log m_{i'}) time and O(n_j log m_{i'}) work. For each location t_j(k), the output is
the longest prefix from D_{i'} that matches beginning at t_j(k).


A detail concerns dynamic space allocation. Note that as patterns get inserted, the
size of the arrays needed to perform namestamping increases. Such a situation arises in
the parallel construction of 2-3 trees [PVW83] in which it is assumed that an array of
infinite size is provided. The 2-3 tree is maintained at the beginning of this array
dynamically. However, we feel that this is not a realistic assumption, especially in our case
where we are concerned with the availability of several two dimensional arrays. We provide
the following solution for this problem.

An amortized solution is immediate: given the initial dictionary size M_0, procure
tables of size 2M_0 x 2M_0 for namestamping. Now, patterns of total size M_0 can be added
to this initial dictionary using the algorithm outlined above. When the size of the
inserted patterns goes beyond M_0, fresh tables of size 4M_0 x 4M_0 are procured. The old
tables are copied into these larger tables (this takes O(M_0) work, which is charged to
the O(M_0) total work done so far in constructing the old tables). Now, the algorithm
proceeds as earlier with the new tables. The old tables are discarded. The various
complexity bounds stated in the previous theorem remain unchanged, except that the bounds
for insertions become amortized bounds. An important detail concerns copying the old
tables into the new tables: copying is done by simulating the dictionary processing
algorithm on the new tables with the old dictionary, rather than by an entry-by-entry copy
of the tables. This is critical, since copying the tables entry by entry would result in
quadratic work.
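The amortized scheme can be illustrated with a minimal sequential sketch. It assumes
that the two-dimensional namestamp tables are modelled by a single Python dict keyed by
(name of shorter prefix, next symbol), and that "simulating the dictionary processing
algorithm" is modelled by re-processing every pattern. The class GrowableStamps and its
methods are hypothetical names introduced only for this illustration; the parallel
algorithm itself is not reproduced.

```python
class GrowableStamps:
    """Toy model of the amortized scheme: rebuild into larger tables on overflow."""

    def __init__(self, patterns):
        self.patterns = list(patterns)
        self.size = sum(len(p) for p in self.patterns)  # current dictionary size
        self.capacity = 2 * max(self.size, 1)           # tables of side 2*M_0
        self.rebuild_work = 0                           # symbols re-processed so far
        self.table = {}
        for p in self.patterns:
            self._process(p)

    def _process(self, pattern):
        # Stand-in for dictionary processing: the name of a prefix is the
        # stamp of the pair (name of the shorter prefix, next symbol).
        name = 0  # name of the empty prefix
        for ch in pattern:
            name = self.table.setdefault((name, ch), len(self.table) + 1)

    def insert(self, pattern):
        self.patterns.append(pattern)
        self.size += len(pattern)
        if self.size > self.capacity:
            # Procure fresh, larger tables and rebuild them by re-processing
            # the dictionary (work linear in its size, not in the table area).
            self.capacity = 2 * self.size
            self.table = {}
            for p in self.patterns:
                self._process(p)
            self.rebuild_work += self.size
        else:
            self._process(pattern)

    def lookup(self, prefix):
        """Return the name of `prefix` if it is a prefix of some pattern."""
        name = 0
        for ch in prefix:
            name = self.table.get((name, ch))
            if name is None:
                return None
        return name
```

In this model a rebuild happens only after the inserted patterns have at least doubled
the dictionary size recorded at the previous rebuild, so the rebuild work is at most a
constant times the total size of the patterns inserted since then.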

This amortized bound can be made a worst-case bound using a variation of standard
techniques for dynamizing data structures [O83]. Essentially, the technique is the
following: assume that tables of size 2M_0 x 2M_0 have been used for processing the
dictionary initially. Dynamic insert operations are performed until patterns of total
size M_0 have been inserted into the dictionary. Following this, tables of size
4M_0 x 4M_0 are procured. Without copying the old tables into the new tables, the
algorithm continues working on the new tables. When any insert is processed, two
operations are performed: inserting into the new tables as dictated by the algorithm
(being careful to read any relevant entries in the old tables), and copying portions of
the old tables into the new tables. Again, copying is done as in the amortized case, i.e.,
by simulating the dictionary processing algorithm. By the time new patterns of total size
M_0 have been inserted, the old tables have been fully copied into the new tables, and
therefore the old tables can be discarded. Hence, the previous theorem holds with
worst-case bounds.

Our description above carries over to the case when several pattern strings are inserted
simultaneously.
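The worst-case variant described above can be sketched sequentially as follows. This
is an illustration under stated assumptions, not the parallel algorithm: tables are
again modelled by dicts keyed by (name, symbol), each insert of length μ migrates at
most 2μ symbols of old patterns into the new table, and queries consult both tables
while a migration is in progress. The class IncrementalStamps, the budget factor 2, and
the method names are hypothetical choices made for this sketch.

```python
class IncrementalStamps:
    """Toy model: old tables are copied gradually, a little on every insert."""

    def __init__(self, patterns):
        self.patterns = list(patterns)
        self.size = sum(len(p) for p in self.patterns)
        self.capacity = 2 * max(self.size, 1)
        self.new, self.old = {}, None
        for p in self.patterns:
            self._add(self.new, p)
        self.cursor = None  # (pattern index, offset, name) while migrating
        self.limit = 0      # patterns[:limit] are the ones held by the old table

    @staticmethod
    def _step(table, name, ch):
        return table.setdefault((name, ch), len(table) + 1)

    def _add(self, table, pattern):
        name = 0
        for ch in pattern:
            name = self._step(table, name, ch)

    def insert(self, pattern):
        self.patterns.append(pattern)
        self.size += len(pattern)
        if self.size > self.capacity:
            # In this model the previous migration has always finished by now.
            assert self.old is None
            self.old, self.new = self.new, {}   # procure fresh tables, no copying
            self.capacity = 2 * self.size
            self.cursor = (0, 0, 0)
            self.limit = len(self.patterns) - 1
        self._add(self.new, pattern)
        if self.old is not None:
            self._migrate(2 * len(pattern))     # bounded copying per insert

    def _migrate(self, budget):
        i, k, name = self.cursor
        while i < self.limit:
            p = self.patterns[i]
            if k == len(p):
                i, k, name = i + 1, 0, 0        # move on to the next old pattern
            elif budget > 0:
                name = self._step(self.new, name, p[k])
                k, budget = k + 1, budget - 1
            else:
                break
        if i >= self.limit:
            self.old, self.cursor = None, None  # old tables can be discarded
        else:
            self.cursor = (i, k, name)

    @staticmethod
    def _walk(table, s):
        name = 0
        for ch in s:
            name = table.get((name, ch))
            if name is None:
                return False
        return True

    def has_prefix(self, s):
        """True if `s` is a prefix of some pattern; reads the old table if needed."""
        tables = [t for t in (self.new, self.old) if t is not None]
        return any(self._walk(t, s) for t in tables)
```

Each insert therefore does work proportional to the length of the inserted pattern, and
the old table is emptied before the new one can overflow.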

6.1.2  Partly  Dynamic  Dictionary  Matching 

Following [AFM92,AF91,F93], a parallel algorithm for the following problem is easily
derived: given a prefix from the dictionary that is dynamic under insert operations,
determine its longest prefix that is a pattern. Essentially, the following simple subset
of the techniques developed in [AFM92] is sufficient. Maintain a trie of the pattern
strings. Initially, the suffix tree of the dictionary can be modified to serve as the
trie. Certain nodes in the trie are marked, corresponding to the patterns in the
dictionary. The query concerning the longest prefix of a given prefix that is a pattern
in the dictionary translates to determining the nearest marked ancestor in this trie. To
maintain this information dynamically, the authors of [AFM92] maintain the Euler tour of
this trie in a balanced tree.
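The reduction to nearest marked ancestors can be illustrated with a small sequential
sketch. It deliberately substitutes a plain parent-pointer walk for the Euler-tour and
balanced-tree machinery of [AFM92], so it shows only what the query computes, not how
the cited bounds are obtained. The names TrieNode, PatternTrie and
longest_pattern_prefix are hypothetical.

```python
class TrieNode:
    def __init__(self, parent=None, depth=0):
        self.parent = parent
        self.depth = depth
        self.children = {}
        self.marked = False  # True if the path to this node spells a pattern


class PatternTrie:
    def __init__(self, patterns=()):
        self.root = TrieNode()
        for p in patterns:
            self.insert(p)

    def insert(self, pattern):
        node = self.root
        for ch in pattern:
            if ch not in node.children:
                node.children[ch] = TrieNode(node, node.depth + 1)
            node = node.children[ch]
        node.marked = True

    def locate(self, prefix):
        """Return the trie node for `prefix`, or None if it is not in the trie."""
        node = self.root
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return None
        return node

    def longest_pattern_prefix(self, prefix):
        """Nearest marked ancestor (the node itself included) of `prefix`."""
        node = self.locate(prefix)
        while node is not None and not node.marked:
            node = node.parent
        return None if node is None else prefix[:node.depth]
```

For example, with patterns "ab", "abcd" and "b", the dictionary prefix "abc" is answered
by "ab", the deepest marked node on the path from the root to "abc".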

Inserting  a  new  pattern  is  performed  as  follows.  From  the  description  of  the  algorithm 
for  partly  dynamic  prefix-matching,  it  follows  that  the  longest  prefix  of  the  new  pattern 
that  is  present  in  the  current  dictionary  is  available.  This  information  can  readily  be 
obtained  as  a  pointer  to  the  node  in  the  trie  with  the  corresponding  prefix.  At  this  node, 
the  new  pattern  is  inserted.  In  [AFM92],  it  is  described  how  to  maintain  the  Euler  tour 
when  patterns  are  inserted.  Given  this  algorithm,  our  result  from  Theorem  7  immediately 
yields, 

Theorem 8 Processing D_0 takes O(log M_0) time and O(M_0 log M_0) work. The operation
insert(P, D_{i-1}) takes O(log M_{i-1}) time and O(μ_i log M_{i-1}) work. The operation
match(t_j, D_{i'}) takes O(log M_{i'}) time and O(n_j log M_{i'}) work. The output for
each text location is the longest pattern that matches beginning at that location.

Once again, a detail concerns dynamic space allocation for maintaining the tries. As
mentioned in Section 6.1.1, an amortized solution is immediate; this can be made
into a worst-case solution using standard methods [O83].


6.2  Fully  Dynamic  Dictionary  Matching 

The fully dynamic dictionary matching problem admits the operation of deleting patterns
from the dictionary in addition to those supported by the partly dynamic dictionary
matching problem. Let the kth operation that modifies the dictionary (through insertions
and deletions) be delete(P, D_{k-1}), where P has length η_k. This operation involves
removing the pattern P from D_{k-1}, resulting in dictionary D_k. We modify our static
dictionary matching algorithm from Section 4.

When a delete operation is performed, the pattern to be deleted is not removed from
the dictionary. It is simply “marked”. When the total size of the patterns in the
dictionary falls below a fraction of the size of the tables, the marked patterns are
squeezed out using a fast prefix-sum computation [CV89] and the various tables are
rebuilt. Hence, the work bounds for deletions are amortized (see footnote 6). Again we
first consider fully dynamic prefix-matching, in which the output for the operation
match(t_j, D_{i'}) is, for each text location t_j(k), the longest prefix from D_{i'}
which occurs beginning at t_j(k).
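The squeezing step can be sketched as a prefix-sum compaction. The following minimal
Python sketch is sequential and uses itertools.accumulate in place of the parallel
prefix-sum algorithm of [CV89]; the function name squeeze and the boolean list deleted
are assumptions made for illustration.

```python
from itertools import accumulate


def squeeze(patterns, deleted):
    """Compact `patterns`, dropping every entry whose `deleted` flag is set."""
    keep = [0 if d else 1 for d in deleted]
    # Inclusive prefix sums: position[i] - 1 is the output slot of survivor i.
    position = list(accumulate(keep))
    out = [None] * (position[-1] if position else 0)
    # Every iteration writes to a distinct slot, so in a parallel setting
    # each index i could be handled by its own processor.
    for i, p in enumerate(patterns):
        if keep[i]:
            out[position[i] - 1] = p
    return out
```

After the compaction, the tables would be rebuilt from the surviving patterns.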

6.2.1  Fully  Dynamic  Prefix-Matching 

We describe modifications to our algorithm in Section 4.1. Our modifications concern
namestamping and yield more sophisticated variations to be used in the fully dynamic
case. This is because deleting patterns brings up issues not encountered earlier.
Recall from Section 3 that to namestamp with S_i, each tuple (x, y) in S_i is assigned a
namestamp f(x). When a tuple with element x is deleted, we may need to change f(x). Two
possibilities occur: if deleting the tuple leaves no tuples in S_i with element x, f(x)
has to be cleared. On the other hand, if deleting the tuple leaves some tuples in S_i
with element x, no changes need be made, provided we are not particular about the
namestamp. To handle this case, we need only keep track of the number of tuples that have
the same element. To satisfy this requirement, we define below a variation of
namestamping called dynamic stamp-counting. Assuming now that we are particular about the
namestamp f(x), it is not sufficient to keep track of the number of tuples with an
element x: we need to additionally keep track of the namestamps of these tuples.
Corresponding to this, we define a variant of namestamping called dynamic stamp-listing.

Now we formally define two variants of namestamping. Consider dynamic stamp-counting.
Initially we are given a set of tuples S_0. To each distinct element x in S_0, we
assign namestamp f(x) as in standard namestamping. In addition, we assign to x a count
c(x) of the number of tuples in S_0 with element x. The following sequence of operations
is performed: (i) insert(s, S_i) adds a tuple s to S_i, giving S_{i+1} = S_i ∪ {s}, as
defined in partly dynamic namestamping; (ii) delete(s, S_i) removes a tuple s from S_i,
giving S_{i+1} = S_i − {s}; (iii) namestamp(A, S_i), i.e., namestamp set A as in standard
namestamping. Dynamic stamp-listing is defined analogously to dynamic stamp-counting by
replacing c(x) with p(x), which is a pointer to a list of all tuples (x_1, y_1) ∈ S_i
such that x = x_1.
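The three operations of dynamic stamp-counting can be illustrated by a minimal
sequential sketch. It assumes that a Python dict stands in for the namestamp table, that
stamps are drawn from a running counter, and that the element x of a tuple is its first
component; the class StampCounter is a hypothetical name, and the sketch says nothing
about the parallel bounds discussed next.

```python
class StampCounter:
    """Toy dynamic stamp-counting: f(x) is a stamp, c(x) a reference count."""

    def __init__(self, tuples=()):
        self.stamp = {}   # f(x) for every element x currently present
        self.count = {}   # c(x): number of tuples with element x
        self._next = 1    # next unused stamp
        for s in tuples:
            self.insert(s)

    def insert(self, s):
        x = s[0]
        if x not in self.count:
            self.stamp[x] = self._next
            self._next += 1
            self.count[x] = 0
        self.count[x] += 1

    def delete(self, s):
        # Assumes the tuple s is currently in the set.
        x = s[0]
        self.count[x] -= 1
        if self.count[x] == 0:
            # No tuple with element x remains, so f(x) is cleared.
            del self.count[x]
            del self.stamp[x]

    def namestamp(self, elements):
        """Stamp each element of `elements`; None if no tuple carries it."""
        return [self.stamp.get(a) for a in elements]
```

Deleting one of several tuples with the same element leaves the stamp untouched, which
is exactly the case where we are not particular about the namestamp; stamp-listing would
replace the count by a list of the tuples themselves.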

In our applications, all the integers in dynamic namestamping would be in the range
[1 ··· M], where M is the dictionary size. We claim that dynamic stamp-counting is
exactly as hard as the integer-sort problem [BDHPRS91], that is, sorting M numbers in
the range [1 ··· M]. Obviously, if the elements of S_0 are sorted, the namestamps and
counts can be assigned in O(α(M)) time and O(M) work using the standard algorithm to
find the nearest 1, for each 0, in a boolean array [Rag90]. Also, if stamp-counting the
set S_0 is accomplished, the set of tuples in S_0 can be sorted using prefix-sum
computations, which take O(log M / log log M) time and O(M) work [CV89]. Hence,
stamp-counting is at least as hard as providing optimal algorithms for integer sorting.
Note that no deterministic optimal algorithms are known for the integer-sort problem.

6 We use a simple amortization here. Assume a sequence of operations, the ith of which
involves W_i work. After a sequence such that Σ W_i = cM for a constant fraction c, we
have an operation that takes f(M) work for some function f. The amortized work for the
ith operation in this case is defined as (W_i / cM) x f(M).

Following this claim, dynamic stamp-counting can be done deterministically within the
following bounds: 1. processing S_0 in O(log |S_0| / log log |S_0|) time and
O(|S_0| log log |S_0|) work [BDHPRS91]; 2. insert(s, S_i) and delete(s, S_i) in O(1)
time and O(1) work; 3. namestamp(A, S_i) in O(1) time and O(|A|) work. Dynamic
stamp-listing can be performed using integer sorting, with space quadratic in M when the
tuples have two elements, each in the range 1 ··· M, as follows. Essentially, keep a
doubly linked list of the tuples with a particular stamp on top of an array of size M
for each stamp. In all our applications of dynamic stamp-listing, the tuples have only
two elements; hence, the space utilized is at most quadratic in the dictionary size. It
follows that the abovementioned bounds are achievable for dynamic stamp-listing as well.

We now return to fully dynamic prefix-matching. As in partly dynamic prefix-matching,
we simulate parts of the static prefix-matching algorithm. Again the various standard
namestamping steps are replaced by the appropriate variants as follows:
(i) namestamping in prefix-naming and naming (shrink-and-spawn step) is replaced by
partly dynamic namestamping, (ii) namestamping in Extend-Right is replaced by dynamic
stamp-counting, and (iii) namestamping in the retrieve-index problem (Phase 2) is
replaced by dynamic stamp-listing.

Theorem 9 The initial dictionary D_0 can be processed in O(log M_0 / log log M_0) time
and O(M_0 log log M_0) work deterministically. The insert(P, D_{i-1}) operation can be
performed in O(log μ_i) time and O(μ_i) work. The delete(P, D_{k-1}) operation can be
performed in O(log M_k / log log M_k) time and O(η_k log log M_k) amortized work. A
match(t_j, D_{i'}) is performed in O(log m_{i'}) time and O(n_j log m_{i'}) work.

Consider inserting or deleting several pattern strings simultaneously. The following
modifications are incorporated. The various tuples which are simultaneously inserted
or deleted are sorted using the integer-sorting algorithm. In dynamic stamp-listing, the
list of tuples with the same stamp is kept in a 2-3 tree, which can be updated as
in [PVW83]. Dynamic stamp-counting and stamp-listing with these modifications yield
dynamic prefix-matching algorithms within bounds slightly worse than those cited in
Theorem 9.

6.2.2  Fully  Dynamic  Dictionary  Matching 

Combining our result for the fully dynamic prefix-matching problem with techniques
of Amir, Farach and Matias [AFM92,F93], an algorithm for fully dynamic dictionary
matching is derived. As in partly dynamic dictionary matching, a trie of pattern strings
is maintained. In addition, UNION/FIND operations are implemented on the set of
marked nodes in the trie. This keeps track of the available marked ancestors for each
marked node as patterns get deleted. To sum up,

Theorem 10 The dictionary D_0 is processed in O(log M_0) time and O(M_0 log M_0) work.
The insert(P, D_{i-1}) operation takes O(log M_{i-1}) amortized time and
O(μ_i log M_{i-1}) amortized work. The operation delete(P, D_{k-1}) can be performed in
O(log M_{k-1}) amortized time and O(η_k log M_{k-1}) amortized work. The operation
match(t_j, D_{i'}) takes O(log M_{i'}) time and O(n_j log M_{i'}) work.


7  Multi-pattern  String  Matching  and  Related  Problems

We  modify  the  algorithm  for  static  dictionary  matching  in  Section  4  to  derive  simple 
optimal  algorithms  when  all  the  patterns  in  the  dictionary  are  of  the  same  length;  this 
is  the  multi-pattern  string  matching  problem  of  [KLP89].  In  this  process  we  make  some 
basic  observations  which  we  believe  are  critical  in  using  the  shrink-and-spawn  technique 
to  derive  optimal  speed-up  algorithms,  even  when  the  overall  work  is  linear  in  the  input 
size.  The  main  idea  is  similar  in  spirit  to  that  in  Section  4.4  where  we  discussed  issues 
in  shrinking  the  text  sizes  by  a  fraction  recursively.  When  all  patterns  are  of  the  same 
length,  this  step  and  the  corresponding  step  of  extending  the  match  to  the  left  when 
returning  from  lower  levels  of  recursion  can  be  achieved  efficiently.  The  details  are  as 
follows. 

Let T be the set of text strings and let P denote the set of pattern strings, each of
length m. The total size of P is M. Unlike in Section 4 and other earlier sections where
we first solve the prefix-matching version of the problem, in this section we solve the
dictionary matching problem directly. Also, as it turns out, we do not need the
prefix-names for the strings in P. The name of a pattern string, defined to be the
prefix-name of the last location in it, is sufficient. For reasons to be specified
later, we maintain a stronger recursive invariant of generating the names at each
recursive level, for the appropriate set of strings derived from the dictionary at that
level; recall that in Section 4, the prefix-names were computed in one step that preceded
the recursive prefix-matching step. Our algorithm proceeds as follows:

Step 1. (Optimal Shrink-and-spawn) Let P̂ be obtained from P by replacing each P_j ∈ P
by the following two strings: P_j^s, a suffix obtained by dropping the leading symbol in
P_j, and P_j^p, a prefix obtained by dropping the last symbol in P_j. All strings in P̂
are of the same length. Let δ' be a naming function for all the substrings of length 4 in
T and P̂. Shrink each string in P̂ by 4 to get P'. The residues of length at most 3 will
be considered later. Spawn 4 copies of each string in T. Note that for a string t ∈ T,
the four copies are t^i for 1 ≤ i ≤ 4 (see the definition of shrink-and-spawn in
Section 3). Delete the alternate strings t^2 and t^4 for each t ∈ T. Let the resultant
set be T'. Note that the factor by which the text is spawned is half the factor by which
the dictionary strings are shrunk. As a result, P' is of size 2 x M/4 = M/2 and T' is of
size n/2.
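Step 1 can be illustrated with a small sequential sketch. Since the definition of
shrink-and-spawn from Section 3 is not repeated here, the sketch assumes the following
reading: shrinking a string by 4 replaces each consecutive block of 4 symbols by its
name and leaves a residue of fewer than 4 symbols, and the ith spawned copy of a text is
the shrunk form of the text starting at its ith position. The helper names make_namer,
shrink and step1 are hypothetical.

```python
def make_namer():
    """Return a naming function: equal blocks get equal names (a dict lookup)."""
    names = {}

    def name(block):
        return names.setdefault(block, len(names) + 1)

    return name


def shrink(s, name, k=4):
    """Replace every full block of k symbols by its name; return (shrunk, residue)."""
    full = len(s) - len(s) % k
    shrunk = [name(tuple(s[i:i + k])) for i in range(0, full, k)]
    return shrunk, s[full:]


def step1(patterns, texts, k=4):
    name = make_namer()
    halves = []
    for p in patterns:
        halves.append(p[1:])    # suffix: leading symbol dropped
        halves.append(p[:-1])   # prefix: last symbol dropped
    p_shrunk = [shrink(q, name, k) for q in halves]
    t_spawned = []
    for t in texts:
        # Keep only the copies starting at positions 1 and 3 (offsets 0 and 2);
        # the copies starting at positions 2 and 4 are the deleted ones.
        for offset in (0, 2):
            t_spawned.append(shrink(t[offset:], name, k)[0])
    return p_shrunk, t_spawned
```

With one pattern of length 10 and one text of length 12, the shrunk dictionary has 4
names (at most M/2 = 5) and the spawned text has 5 names (at most n/2 = 6), in line with
the size bounds stated in Step 1.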

Step 2. (Recursive Step) Recursively solve the dictionary matching problem on P' and
T'. This returns a name for each string p ∈ P', denoted by δ(p). In addition, for each
location i in each T_j ∈ T', this returns the name of the pattern in P' that matches
beginning at that location. Equivalently, for each odd text location T_j(i) ∈ T, the name
of the pattern from P' which matches at that location has been determined. Let this be
denoted by α(T_j(i)).

Recall that in Section 4.1, prefix names were computed for P. At each level in
recursion, the prefix names for a new set of patterns for lower levels of recursion could
be gleaned from those of D because the new set of patterns were prefixes of D. In our
recursive step here, not all strings in P' are prefixes of P. Hence, names for strings in
P' cannot be directly derived from those of P. One approach towards generating the names
for P' is performing a prefix-naming computation at each stage of recursion. However,
each stage would then take O(log m) time, leading to an overall algorithm that works in
O(log^2 m) time. Our approach here has been to embed the computation of the names
for the pattern strings within the recursive framework by strengthening the recursive
invariant.

Step  3.  In  this  step,  we  perform  extension  to  the  right  and  extension  to  the  left  as  in 
Section  4.1  and  Section  4.4.  In  addition,  we  generate  the  prefix-names  for  the  strings  in 
the  set  P  so  the  recursive  invariant  is  maintained. 

Step 3a. We find the names for the strings in P. Consider each string P_j ∈ P, and let
P_j^p denote P_j with its last symbol dropped. Let the string obtained from P_j^p by
shrinking by 4 be P'_j. The name δ(P'_j) is known. From δ(P'_j), we compute β(P_j), the
name for P_j ∈ P. Let r(P_j^p) denote the residue when P_j^p is shrunk by a factor of 4.
The residue is of length at most 3. The residue for each pattern is of the same length,
denoted by R. Assume that all substrings in T and P of length R have been named using
the function δ'. Now, for each P_j ∈ P, generate the tuple
(δ(P'_j), δ'(r(P_j^p)), P_j(|P_j|)). Name this set of tuples. The result gives the name
β(P_j) for P_j ∈ P.

Step 3b. For each odd position in the text, we find the index of the pattern that
matches beginning at that position. As in Step 3a, corresponding to each P_j ∈ P,
generate a tuple of the form ((δ(P'_j), δ'(r(P_j^p)), P_j(|P_j|)), β(P_j)), where P'_j
is the shrunk form of P_j with its last symbol dropped and r(P_j^p) is the corresponding
residue. Let this be set S_P. Form set S_T by generating, for all odd i in each T_j,

(α(T_j(i)), δ'(T_j(i + |α(T_j(i))| + 1) ··· T_j(i + |α(T_j(i))| + R)), T_j(i + |α(T_j(i))| + R + 1)).

Namestamp S_T with S_P. The stamp for the tuple corresponding to T_j(i) gives the
name of the pattern from P that matches T_j beginning at i, where i is odd.

Step 3c. (Extend-Left) For each even position in the text, we determine the pattern
that matches beginning at that position by extending the pattern from P' that matches
at the neighbor to the right. For each P_j ∈ P, let P_j^s denote P_j with its leading
symbol dropped; δ(P_j^s) denotes the name of the string in P' generated from P_j by
dropping the leading symbol and shrinking the resultant string by a factor of 4. Let the
residue of P_j^s be denoted by r(P_j^s). All such residues are of equal length, the same
as in Step 3a, denoted by R. As in Step 3a, corresponding to each P_j ∈ P, generate
tuples of the form ((P_j(1), δ(P_j^s), δ'(r(P_j^s))), β(P_j)). Let this be set S_P. Form
set S_T by generating, for all even i in each T_j,

(T_j(i), α(T_j(i + 1)), δ'(T_j(i + |α(T_j(i))| + 1) ··· T_j(i + |α(T_j(i))| + R))).

As in Step 3b, namestamp set S_T with S_P. The stamp for the tuple corresponding to
T_j(i) gives the name of the pattern from P that matches T_j beginning at i, where i is
even.

That completes the description of our algorithm. Step 1 takes O(1) time and O(n + M)
work. Let O_T(n, M, m) and O_W(n, M, m) represent the time and work complexity of
our algorithm given text size n and dictionary size M such that each pattern is of length
m. Step 2 takes O_T(n/2, M/2, m/4) time and O_W(n/2, M/2, m/4) work. Step 3a takes
O(1) time and O(M) work. Steps 3b and 3c each take O(1) time and O(n + M) work.
Hence,


O_T(n, M, m) ≤ O_T(n/2, M/2, m/4) + 1.
O_W(n, M, m) ≤ O_W(n/2, M/2, m/4) + n + M.
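For completeness, the two recurrences can be unrolled as follows. This is a sketch
under the assumption that the recursion stops once the pattern length is a constant,
i.e. after about log_4 m levels, and that constant factors hidden in the additive terms
are the same at every level.

```latex
\begin{align*}
O_T(n, M, m) &\le O_T\!\left(\tfrac{n}{2}, \tfrac{M}{2}, \tfrac{m}{4}\right) + 1
  \le \underbrace{1 + 1 + \cdots + 1}_{\lceil \log_4 m \rceil \text{ terms}} + O(1)
  = O(\log m), \\
O_W(n, M, m) &\le O_W\!\left(\tfrac{n}{2}, \tfrac{M}{2}, \tfrac{m}{4}\right) + (n + M)
  \le \sum_{k \ge 0} \frac{n + M}{2^{k}}
  = 2\,(n + M) = O(n + M).
\end{align*}
```

The work sum is geometric because both the text and the dictionary halve at each level,
which is why shrinking by 4 while spawning only 2 copies keeps the total work linear.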


Theorem 11 Given a text of size n and a dictionary of size M consisting of patterns each
of size m, static dictionary matching can be performed in O(log m) time and O(n + M)
work.

This result also yields optimal algorithms for multi-dimensional pattern matching and
other problems such as suffix-prefix matching studied in [KLP89,Rab93]. Rabin [Rab93]
provides randomized algorithms for these problems. Our algorithms, like those in [KLP89],
are deterministic; they are much simpler than, and as efficient as, the ones presented
there.